Search results for "log analysis"

showing 6 items of 6 documents

Improving clustering of Web bot and human sessions by applying Principal Component Analysis

2019

The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors for use as input to classification models. So far, many different features for discriminating between bots’ and humans’ navigational patterns have been considered in session models, but very few studies have been devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables that are efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity …

Bot detection; Principal Component Analysis; PCA; Log analysis; Computer science; k-means; Internet robot; computer.software_genre; Classification; Web bot; Dimensionality reduction; Clustering; Web server; Feature selection; Data mining; Cluster analysis; computer; Communications of the ECMS
researchProduct
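
The abstract above evaluates session clustering in terms of purity, but the text is cut off before any definition is given. The following is a minimal sketch that assumes the standard definition (the share of items falling into the majority class of their own cluster). The function name `cluster_purity` and the sample labels are hypothetical, and the PCA and clustering steps themselves are not shown.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, cluster_ids):
    """Return the purity of a clustering: the share of items that belong
    to the majority class of their own cluster (assumed definition)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        raise ValueError("at least one session is required")

    # Count how often each true class appears inside each cluster.
    per_cluster = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster[cluster][label] += 1

    # Sum the size of the majority class over all clusters.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical example: five sessions assigned to two clusters.
labels = ["bot", "bot", "human", "human", "human"]
clusters = [0, 0, 0, 1, 1]
print(cluster_purity(labels, clusters))
```

A purity of 1.0 means every cluster contains sessions of a single class. Purity tends to rise with the number of clusters, so it is usually compared at a fixed cluster count.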

Modeling a non-stationary bots’ arrival process at an e-commerce Web site

2017

The paper concerns the issue of modeling and generating a representative Web workload for Web server performance evaluation through simulation experiments. Web traffic analysis has been carried out for two decades, usually based on Web server log data. However, while the character of the overall Web traffic has been extensively studied and modeled, relatively few studies have been devoted to the analysis of Web traffic generated by Internet robots (Web bots). Moreover, the overwhelming majority of studies concern the traffic on non-e-commerce websites. In this paper we address the problem of modeling a realistic arrival process of bots’ requests on an e-commerce Web server. Based on real…

Web server; General Computer Science; Computer science; Internet robot; Real-time computing; 02 engineering and technology; E-commerce; computer.software_genre; Session (web analytics); Theoretical Computer Science; Web traffic characterization; Web traffic; 0202 electrical engineering electronic engineering information engineering; Traffic generation model; Web traffic analysis and modeling; business.industry; ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS; 020206 networking & telecommunications; Web bot; Heavy-tailed distribution; Modeling and Simulation; 020201 artificial intelligence & image processing; The Internet; Web log analysis software; Log file analysis; Data mining; business; Regression analysis; computer; Journal of Computational Science
researchProduct

Verification of Web traffic burstiness and self-similarity for multiple online stores

2017

Developing realistic Web traffic models is essential for reliable Web server performance evaluation. Highly significant Web traffic properties identified so far include burstiness and self-similarity. However, very few relevant studies have been devoted to e-commerce traffic. In this paper, we investigate burstiness and self-similarity factors for seven different online stores using their access log data. Our findings show that both features are present in all the analyzed e-commerce datasets. Furthermore, a strong correlation (0.94) between the Hurst parameter and the average request arrival rate was discovered. Estimates of the Hurst parameter for the Web traffic in the online …

Web server; Self-similarity; Computer science; 02 engineering and technology; E-commerce; Web traffic; computer.software_genre; 01 natural sciences; 010104 statistics & probability; Hurst parameter; 0202 electrical engineering electronic engineering information engineering; Range (statistics); Web store; Burstiness; 0101 mathematics; Log analysis; business.industry; 020206 networking & telecommunications; Hurst index; HTTP traffic; business; computer; Computer network
researchProduct
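
The abstract above reports Hurst parameter estimates but does not say which estimator was used. As an illustration only, the sketch below uses the aggregated-variance method, one common choice: for a self-similar series, the variance of block means over blocks of size m scales roughly as m^(2H-2), so H can be read off the slope of a log-log fit. The function name, the block sizes, and the synthetic input are assumptions, not taken from the paper.

```python
import math
import random
import statistics


def hurst_aggregated_variance(counts, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter H of a series of per-interval request
    counts with the aggregated-variance method."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(counts) // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        # Average the series over non-overlapping blocks of length m.
        means = [sum(counts[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        var = statistics.pvariance(means)
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("series too short or too flat for the given block sizes")

    # Least-squares slope of log(variance) against log(block size).
    mean_x = sum(log_m) / len(log_m)
    mean_y = sum(log_var) / len(log_var)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(log_m, log_var))
             / sum((x - mean_x) ** 2 for x in log_m))
    # Var(block mean) ~ m^(2H - 2), hence H = 1 + slope / 2.
    return 1 + slope / 2


# Synthetic uncorrelated noise; such a series should give H close to 0.5.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(8192)]
print(round(hurst_aggregated_variance(noise), 2))
```

Values of H well above 0.5 would point to long-range dependence. For real access logs, `counts` would be the number of requests per fixed time interval.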

Practical Aspects of Log File Analysis for E-Commerce

2013

The paper concerns Web server log file analysis to discover knowledge useful for online retailers. One month of data from the operation of an online bookstore was analyzed with respect to the probability of e-customers making a purchase. Key states and characteristics of user sessions were distinguished, and their relations to the session state connected with purchase confirmation were analyzed. The results allow identification of factors that increase the probability of making a purchase in a given Web store and thus determination of user sessions that are more valuable in terms of e-business profitability. Such results may then be applied in practice, e.g. in a method for personalized or prioritize…

Web server; Service (systems architecture); Database; Computer science; business.industry; E-commerce; computer.software_genre; World Wide Web; Identification (information); Web page; Web log analysis software; Web service; business; computer; Data Web
researchProduct
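
The abstract above relates session states to the purchase-confirmation state without giving the exact method. One simple way to express such a relation is the fraction of sessions visiting a given state that also end in a purchase. The sketch below assumes sessions are already reconstructed from the log as lists of state names; the state names and the function `purchase_probability_by_state` are hypothetical.

```python
from collections import Counter


def purchase_probability_by_state(sessions, target="purchase_confirmation"):
    """For each session state, estimate the probability that a session
    visiting that state also reaches the target (purchase) state."""
    visited = Counter()    # sessions that contain the state
    converted = Counter()  # of those, sessions that also contain the target
    for session in sessions:
        states = set(session)
        bought = target in states
        for state in states - {target}:
            visited[state] += 1
            if bought:
                converted[state] += 1
    return {state: converted[state] / visited[state] for state in visited}


# Hypothetical sessions reconstructed from a Web server log.
sessions = [
    ["home", "cart", "purchase_confirmation"],
    ["home"],
    ["home", "cart"],
    ["search", "cart", "purchase_confirmation"],
]
for state, prob in sorted(purchase_probability_by_state(sessions).items()):
    print(f"{state}: {prob:.2f}")
```

These are plain empirical frequencies. With only a few sessions per state they are noisy, so a real analysis would also report the session counts behind each estimate.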

Syslog-protokollan viestien analysointi järjestelmän vianetsinnän apuna (Analysis of Syslog protocol messages as an aid to system troubleshooting)

2017

Information systems constantly gather operational log data. In case of system failure, this log data can be used as a troubleshooting tool in locating the problem source. The Syslog protocol has become the industry standard for logging. As a result, the need for log analysis techniques and tools developed for the protocol has grown. This bachelor's thesis aims to present these analysis techniques and tools and to consider their usefulness from the perspective of a system administrator troubleshooting a system.

system administration; troubleshooting; Syslog; log analysis
researchProduct
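
The thesis abstract above does not describe specific analysis techniques. As a minimal illustration of a first step any Syslog analysis tool has to take, the sketch below decodes the leading `<PRI>` field, which in the Syslog protocol encodes facility * 8 + severity. The function name `decode_pri` is hypothetical, and the sample line follows the well-known example from the Syslog RFCs.

```python
import re

# Severity names for codes 0-7 as defined by the Syslog protocol.
SEVERITIES = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def decode_pri(line):
    """Split a raw Syslog line into (facility, severity_name, rest)."""
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError("PRI value out of range (0-191)")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity], line[match.end():]


sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed"
facility, severity, rest = decode_pri(sample)
print(facility, severity)  # 4 Critical
```

Filtering on the decoded severity is a cheap way for an administrator to narrow a large log down to the messages most likely to matter during troubleshooting.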

Adaptive framework for network traffic classification using dimensionality reduction and clustering

2012

Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed into numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …

ta113; Computer science; Network security; business.industry; Dimensionality reduction; intrusion detection; k-means; diffusion map; Server log; computer.software_genre; anomaly detection; Traffic classification; machine learning; Web log analysis software; Data mining; Web service; Cluster analysis; business; computer; n-grams
researchProduct
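
The first step of the framework above, turning HTTP queries into numerical matrices with n-gram analysis, can be sketched as follows. This is an assumption-laden illustration: it uses character n-grams with raw counts, and the paper's exact n, tokenization, and weighting are not given in the abstract. The function name `ngram_matrix` and the sample queries are hypothetical, and the PCA, diffusion map, and clustering steps are not shown.

```python
from collections import Counter


def ngram_matrix(queries, n=2):
    """Turn query strings into a count matrix with one row per query
    and one column per distinct character n-gram."""
    # Count the n-grams of each query separately.
    per_query = [
        Counter(q[i:i + n] for i in range(len(q) - n + 1)) for q in queries
    ]
    # The sorted union of all n-grams gives a stable column order.
    vocab = sorted(set().union(*per_query)) if per_query else []
    matrix = [[counts[gram] for gram in vocab] for counts in per_query]
    return vocab, matrix


# Hypothetical query strings taken from an access log.
queries = ["id=1&page=2", "id=1' OR '1'='1"]
vocab, matrix = ngram_matrix(queries, n=2)
print(len(vocab), "distinct 2-grams")
print(matrix[0])
```

Rows built this way are typically long and sparse, which is why a dimensionality-reduction step is applied before any clustering or anomaly scoring.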